The issue of customer aggregation arises frequently [...] physical
distribution sys[...] single commodity models that includes the classical
transportation and capacitated facility location problems as special cases.
For any proposed aggregation of customers, an a priori upper bound is given
on the amount of suboptimality thereby induced in the model. This bound is
of practical use because it can provide a rigorous justification of
aggre[...] other natural criterion. It also suggests a novel way of using
standard clustering techniques to discover customer aggregations with small
associated a priori error bounds. The analytical technique used to derive
these results should prove useful for obtaining similar [...]


CUSTOMER AGGREGATION IN DISTRIBUTION MODELING

Modeling is often said to be an art rather than a science [4]. One
reason is that rigorous justification is seldom available for many of the
design choices faced during the modeling process. This paper attempts to
put a little more science into a very commonly occurring design question
in the area of distribution modeling: how to aggregate the usually very
large number of individual customers into a more tractable number of
groups. The need for such aggregation springs from the desire to avoid
excessive data development costs while building a model, or an excessive
amount of computer time or main storage to solve it. It is not at all
uncommon in practical studies to aggregate several thousand customers down
to one or two hundred demand zones on the basis of geographical proximity
and type of customer. Traditionally this has been done and defended on
the basis of common sense because apparently there is no known rigorous and
practically feasible approach to this task. Our aim is to remedy this
deficiency.

The spirit of the present effort is akin to that of a contemporaneous
paper by the author [3] on the subject of a priori error bounds for the
aggregation of procurement commodities. Although the results are of a
similar type, the details and analytical techniques are in fact quite
different.

1.  Main  Results 

The  following  model  serves  as  the  vehicle  for  our  main  results. 

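(1)  Minimize over $y, z$:  $\sum_{k\ell} d_{k\ell} y_{k\ell} + \sum_k F_k\left( \sum_\ell q_\ell y_{k\ell} \right)$

(2)  subject to  $\sum_k y_{k\ell} = 1$,  all $\ell$

(3)  $\underline{V}_k z_k \le \sum_\ell q_\ell y_{k\ell} \le \overline{V}_k z_k$,  all $k$

(4)  $0 \le y_{k\ell} \le 1$,  all $k\ell$

(5)  $z_k = 0$ or $1$ for all $k$, and $z \in Z$.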

The traditional interpretation is as follows:

$k$            indexes the possible facilities from which customers
               can be served

$\ell$         indexes the customers

$y_{k\ell}$    a variable giving the fraction of the annual needs of
               customer $\ell$ (for goods or services) satisfied by
               facility $k$

$z_k$          a binary variable indicating whether facility $k$ is
               selected for use

$d_{k\ell}$    annual variable costs incurred if the full needs of
               customer $\ell$ are met from facility $k$

$F_k(\cdot)$   other annual costs associated with facility $k$ as a
               function of its annual throughput

$q_\ell$       a quantity (assumed $\ge 0$) measuring the annual needs of
               customer $\ell$

$\underline{V}_k$ ($\overline{V}_k$)
               a lower (upper) limit on the annual throughput permissible
               for facility $k$ if it is used

$Z$            an arbitrary constraint set on $z$.


It is understood that a list of allowable $(k,\ell)$ links is given to
reflect which candidate facilities are allowed to serve which customers.
All summations run only over allowable combinations.

The model as stated is a classical capacitated facility location problem
with possibly nonlinear warehouse-related costs and additional constraints.
No assumptions have been made regarding the form of the functions $F_k$,
so they could incorporate an annual fixed cost $f_k$ associated with the
use of facility $k$ (which is customarily expressed as $f_k z_k$) and the
influence of economies or diseconomies of scale. See, e.g., [2] for a
recent discussion of similar models.

The model need not necessarily involve facility location decisions. By
taking $Z$ to require $z_k = 1$ for all $k$, $F_k$ to be identically 0 and
$\underline{V}_k$ to be 0 for all $k$, (1)-(5) reduces to the classical
transportation problem (with the flow variables scaled by destination
demands).

What does one really mean by "aggregating" a subset $L$ of customers?
An important type of customer aggregation is pro rata by demand, which
amounts to introducing the following additional constraints:

(6)  for each $k$, the $y_{k\ell}$'s must be identical over $\ell \in L$.

An implicit assumption, and one that we adopt henceforth, is

(7)  for each $\ell \in L$, the same $k\ell$ links exist.

An obvious consequence of (6) is that it permits variables and constraints
to be eliminated. The net effect is that the mathematical structure of
(1)-(5) remains unchanged, but with the number of $\ell$ indices reduced
everywhere by $\|L\| - 1$, where $\|L\|$ is the number of indices in $L$.
An illustration is presented in Sec. 2, and subsequently we shall
generalize to the case where several aggregation subsets are involved
simultaneously.

Aggregation clearly can be expected to result in a model with higher
minimum cost. The question is how much higher. Our main result along these
lines is the easily calculated a priori bound given in the following
theorem. The proof is in the Appendix.


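The notation $v[\cdot]$ stands for the optimal value of any optimization
problem to which it is applied.

Customer Aggregation Theorem. Let $L$ be any subset of customers
satisfying (7). Then

(8)  $v[\text{problem (1)-(5)}] \le v[\text{problem (1)-(6)}] \le v[\text{problem (1)-(5)}] + \epsilon_L$,

where

(9)  $\epsilon_L = \sum_{\ell \in L} \max_k \left\{ \frac{q_\ell}{\sum_{\ell' \in L} q_{\ell'}} \sum_{\ell' \in L} d_{k\ell'} - d_{k\ell} \right\}$.

The theorem thus provides, in advance of solving any version of the
model, an upper bound on the amount by which any proposed customer
aggregation could diminish model accuracy. No longer is it necessary to
rely entirely on intuition and cumbersome numerical experimentation with
pilot models.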
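As a minimal sketch of how such a bound might be computed, the Python function below evaluates $\epsilon_L$ under the assumption that (9) takes the form $\epsilon_L = \sum_{\ell \in L} \max_k \{ (q_\ell / \sum_{\ell' \in L} q_{\ell'}) \sum_{\ell' \in L} d_{k\ell'} - d_{k\ell} \}$, which is the form consistent with the worked calculation in Sec. 2. The function name, the dictionary layout and the two-facility data are illustrative assumptions and do not come from the paper; the total demand of $L$ is assumed to be positive.

```python
def aggregation_error_bound(d, q, L):
    """A priori bound eps_L for aggregating the customers in L.

    d[k][l] : annual variable cost of serving all of customer l from facility k
    q[l]    : annual needs of customer l
    L       : customers to aggregate (every facility is assumed to have a
              link to every customer in L, as required by (7))
    """
    total_q = sum(q[l] for l in L)
    bound = 0.0
    for l in L:
        share = q[l] / total_q  # customer l's pro rata share of the group demand
        # Worst case over facilities of (pro rata share of group cost) - (own cost).
        bound += max(share * sum(d[k][m] for m in L) - d[k][l] for k in d)
    return bound


# Hypothetical data: two facilities, two customers with equal needs.
d = {"A": {"c1": 10.0, "c2": 14.0},
     "B": {"c1": 12.0, "c2": 8.0}}
q = {"c1": 5.0, "c2": 5.0}

eps = aggregation_error_bound(d, q, ["c1", "c2"])
print(eps)
```

A subset containing a single customer always gives a bound of zero in this sketch, since the pro rata share of the group cost then coincides with the customer's own cost.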


2. A Numerical Illustration

The above ideas will be illustrated numerically in miniature using a
small classical transportation problem.

Consider a firm with facilities in Seattle, Los Angeles and Houston,
and with customers in Dallas, Chicago, Atlanta, Pittsburgh, New York and
Boston. An aggregation of the northeastern customers is desired if this
can be done without introducing excessive error. The aggregation analysis
will be conducted using approximate transportation costs based on
available regression relationships of cost against distance; this enables
transportation costs to be estimated inexpensively by computer knowing
only the locational coordinates of each origin and destination. After the
aggregation analysis is completed, more accurate transportation costs
would be developed for the smaller (aggregated) model and used thereafter.

Table 1 lays out the full transportation problem in a traditional
format using approximate transportation cost data (taken here to be
proportional to distance at the rate of one dollar per hundredweight per
thousand miles). Supplies and demands are given in thousands of
hundredweight. Disregard the optimal solution shown for the time being.


TABLE 1: FULL TRANSPORTATION PROBLEM
(Optimal solution shown in parentheses; optimal value = $116,005)

Grouping Pittsburgh, New York and Boston together is proposed as the
first trial aggregation. This leads to the reduced problem shown in Table
2 (again, disregard the optimal solution shown). Notice that the demand of
PIT/NY/BOS is just the sum of the individual demands, and that the unit
costs in this column are weighted pro rata demand. For instance, the [...]
2.761 $/CWT.


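[Table 2: reduced problem for the PIT/NY/BOS aggregation; only the unit
costs ($/CWT) are legible]

               Dallas   Chicago   Atlanta   PIT/NY/BOS
Seattle         2.078     2.013     2.618        2.761
Los Angeles     1.387     2.054     2.182        2.733
Houston         0.243     1.067     0.789        1.580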

TABLE 3: SECOND AGGREGATION
(Optimal solution shown in parentheses; optimal value = $116,170)

[Table body largely illegible; the surviving first-row unit costs are
2.078, 2.013, 2.618 and 2.465 $/CWT]


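The a priori error bound for this second aggregation is smaller:
$\epsilon_{\text{NY,BOS}} = \$200$. To show the calculation in detail, we
first express the general formula (9) in terms of unit transportation costs
$c_{k\ell}$, i.e., we write $d_{k\ell}$ as $c_{k\ell} q_\ell$ to obtain an
equivalent representation

(9A)  $\epsilon_L = \sum_{\ell \in L} q_\ell \, \max_k \left\{ \frac{\sum_{\ell' \in L} q_{\ell'} c_{k\ell'}}{\sum_{\ell' \in L} q_{\ell'}} - c_{k\ell} \right\}$.

Now putting $L = \{\text{NY}, \text{BOS}\}$, we obtain

$\epsilon_{\text{NY,BOS}} = q_{\text{NY}} \max_k \left\{ \frac{q_{\text{NY}} c_{k,\text{NY}} + q_{\text{BOS}} c_{k,\text{BOS}}}{q_{\text{NY}} + q_{\text{BOS}}} - c_{k,\text{NY}} \right\} + q_{\text{BOS}} \max_k \left\{ \frac{q_{\text{NY}} c_{k,\text{NY}} + q_{\text{BOS}} c_{k,\text{BOS}}}{q_{\text{NY}} + q_{\text{BOS}}} - c_{k,\text{BOS}} \right\}$

$= 15{,}000 \max (2.879 - 2.815,\; 2.856 - 2.786,\; 1.686 - 1.608) + 10{,}000 \max (2.879 - 2.976,\; 2.856 - 2.960,\; 1.686 - 1.804)$

$= (15{,}000)(0.078) + (10{,}000)(-0.097) = \$200.$

Assuming that this smaller error bound is deemed acceptable, the second
aggregation would be accepted as sufficiently accurate.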
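The arithmetic of this calculation can be replayed in a few lines of Python. The sketch below takes the unit costs exactly as printed above; the variable names are illustrative assumptions, and the three entries of each list are simply kept in the order in which they appear in the calculation. The second half repeats the calculation with the demand-weighted averages left unrounded, as a check on how sensitive the result is to the three-decimal rounding.

```python
# Unit costs ($/CWT) as printed in the calculation, one entry per facility.
c_agg = [2.879, 2.856, 1.686]   # aggregated NY/BOS customer (rounded averages)
c_ny = [2.815, 2.786, 1.608]    # New York
c_bos = [2.976, 2.960, 1.804]   # Boston
q_ny, q_bos = 15_000, 10_000    # demands in CWT

# Largest (aggregated cost - own cost) over the facilities, per customer.
worst_ny = max(a - c for a, c in zip(c_agg, c_ny))
worst_bos = max(a - c for a, c in zip(c_agg, c_bos))
eps = q_ny * worst_ny + q_bos * worst_bos
print(round(worst_ny, 3), round(worst_bos, 3), round(eps))

# Same calculation with the demand-weighted averages left unrounded.
c_exact = [(q_ny * n + q_bos * b) / (q_ny + q_bos) for n, b in zip(c_ny, c_bos)]
eps_exact = (q_ny * max(a - c for a, c in zip(c_exact, c_ny))
             + q_bos * max(a - c for a, c in zip(c_exact, c_bos)))
print(round(eps_exact))
```

With the printed (rounded) averages the sketch reproduces the $200 figure; with unrounded averages it gives about $210, so the bound is sensitive to rounding at the level of a few dollars.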

This concludes the miniature illustration of the trial-and-error
process by which different aggregations can be proposed and evaluated until
one is found with an acceptable compromise between parsimony and exposure
to modeling error.

It may also be of interest to use this illustration to exemplify the
assertions of the Customer Aggregation Theorem in somewhat greater detail.
The optimal solutions of the full transportation problem and its two
aggregations are shown in Tables 1-3. Evidently the inequalities of the
theorem turn out to be satisfied:

$\text{optimal value of full problem} \;\le\; \text{optimal value of PIT/NY/BOS aggregation} \;\le\; \text{optimal value of full problem} + \epsilon_{\text{PIT/NY/BOS}}$

Moreover, by virtue of the nature of pro rata demand aggregation, the
solution to each aggregated problem can easily be disaggregated into a
feasible solution to the full problem with the same cost. For instance, the
SEA - NY/BOS flow of 5,000 CWT in the second aggregation would be
disaggregated into

    $\tfrac{15}{25} \times 5000 = 3000$ CWT to New York

    $\tfrac{10}{25} \times 5000 = 2000$ CWT to Boston

without changing the associated transportation cost (5000 x 2.879 ≈
3000 x 2.815 + 2000 x 2.976, up to rounding of the unit costs). The other
flows to NY/BOS would be disaggregated similarly. The resulting
disaggregated feasible solution to the full problem is suboptimal to within
its respective a priori error bound. For the second aggregation, for
instance, the suboptimality of $116,170 - $116,005 = $165 is under the
a priori bound of $200.
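The pro rata disaggregation step can be sketched in Python as follows. The helper name `disaggregate` and the dictionary layout are assumptions made for illustration; the numbers are the ones quoted in the text.

```python
def disaggregate(flow, demands):
    """Split an aggregated flow among customers in proportion to demand."""
    total = sum(demands.values())
    return {name: flow * need / total for name, need in demands.items()}


demands = {"NY": 15, "BOS": 10}      # thousands of CWT
parts = disaggregate(5000, demands)  # SEA -> NY/BOS flow in CWT
print(parts)

# Cost before and after disaggregation, using the unit costs quoted above.
unit_cost = {"NY": 2.815, "BOS": 2.976}
cost_aggregated = 5000 * 2.879
cost_disaggregated = sum(parts[name] * unit_cost[name] for name in parts)
print(cost_aggregated, cost_disaggregated)
```

The two costs agree only to within a couple of dollars because the aggregated unit cost 2.879 is itself a rounded weighted average.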

3.  Sufficient  Condition  for  Zero  Aggregation  Error 

It is of interest to examine the conditions under which (9) yields
$\epsilon_L = 0$. One easy sufficient condition is as follows.

Corollary. Suppose that the $d_{k\ell}$'s can be written in the factored
form

(10)  $d_{k\ell} = q_\ell \lambda_k$  for all $k\ell$ with $\ell \in L$

for a suitable set of $\lambda_k$'s. Then $\epsilon_L = 0$. (However, (10)
is not a necessary condition for $\epsilon_L = 0$.)


This agrees with one's expectation that $\epsilon_L$ should be 0 when the
customers of $L$ are all situated close together geographically and are
"similar" in terms of demand type, for then $\lambda_k$ would have a
natural interpretation in terms of the cost per unit quantity of satisfying
the needs of any customer in $L$ from facility $k$. More specifically,
suppose that $d_{k\ell}$ is composed of an "acquisition" cost $\alpha_k$
$/unit plus a transportation cost based on a rate $\beta$ $/unit-mile. If
the distance from facility $k$ to any customer in $L$ is virtually the
same, say $\delta_k$ miles, then

(11)  $d_{k\ell} = q_\ell (\alpha_k + \beta \delta_k)$  for all $k\ell$ with $\ell \in L$

and (10) holds with $\lambda_k = \alpha_k + \beta \delta_k$.
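The corollary can be checked by direct substitution. The short derivation below is a sketch that assumes (9) has the form $\epsilon_L = \sum_{\ell \in L} \max_k \{ (q_\ell / \sum_{\ell' \in L} q_{\ell'}) \sum_{\ell' \in L} d_{k\ell'} - d_{k\ell} \}$, as suggested by the calculation in Sec. 2, and writes $\lambda_k$ for the facility-dependent factor in (10).

```latex
\begin{align*}
\frac{q_\ell}{\sum_{\ell' \in L} q_{\ell'}} \sum_{\ell' \in L} d_{k\ell'} - d_{k\ell}
  &= \frac{q_\ell}{\sum_{\ell' \in L} q_{\ell'}}
     \sum_{\ell' \in L} q_{\ell'} \lambda_k - q_\ell \lambda_k \\
  &= q_\ell \lambda_k - q_\ell \lambda_k = 0
  \qquad \text{for every } k \text{ and every } \ell \in L .
\end{align*}
```

Every term inside every maximum therefore vanishes, and so does their sum.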
4. Extension to Several Aggregation Subsets

A natural extension of the Customer Aggregation Theorem addresses
the case where several subsets of customers are to be aggregated
simultaneously. Let $L_1, \ldots, L_H$ be disjoint subsets of customer
indices such that each subset individually satisfies (7). The analog of
(6) is

(6)'  for $h = 1, \ldots, H$ and all $k$, the $y_{k\ell}$'s are identical over $\ell \in L_h$.

It is not difficult to show that the analog of (8) still holds, namely

(8)'  $v[\text{problem (1)-(5)}] \le v[\text{problem (1)-(5), (6)'}] \le v[\text{problem (1)-(5)}] + \epsilon_{L_1, \ldots, L_H}$,

where

(9)'  $\epsilon_{L_1, \ldots, L_H} = \sum_{h=1}^{H} \sum_{\ell \in L_h} \max_k \left\{ \frac{q_\ell}{\sum_{\ell' \in L_h} q_{\ell'}} \sum_{\ell' \in L_h} d_{k\ell'} - d_{k\ell} \right\}$.

This is exactly the same result as would be obtained from $H$ sequential
applications of the Customer Aggregation Theorem in its original form.




5.  Extension  to  a Two-Stage  Model 

Another natural extension is to the case of a two-stage distribution
system. The natural two-stage version of problem (1)-(5) is:

(12)  Minimize over $x, y, z$:  $\sum_{jk} c_{jk} x_{jk} + \sum_{k\ell} d_{k\ell} y_{k\ell} + \sum_k F_k\left( \sum_\ell q_\ell y_{k\ell} \right)$

(13)  subject to  $\underline{S}_j \le \sum_k x_{jk} \le \overline{S}_j$,  all $j$

(14)  $\sum_j x_{jk} = \sum_\ell q_\ell y_{k\ell}$,  all $k$

(2)   $\sum_k y_{k\ell} = 1$,  all $\ell$

(3)   $\underline{V}_k z_k \le \sum_\ell q_\ell y_{k\ell} \le \overline{V}_k z_k$,  all $k$

(15)  $x_{jk} \ge 0$,  all $jk$

(4), (5)  $0 \le y_{k\ell} \le 1$, all $k\ell$;  $z_k = 0$ or $1$ for all $k$;  $z \in Z$.

The following new interpretations are appropriate:

$j$            indexes the sources which supply the facilities

$x_{jk}$       a variable giving the annual amount of supplies obtained
               by facility $k$ from source $j$

$c_{jk}$       unit cost of procurement or production plus transportation
               associated with the flow $x_{jk}$

$\underline{S}_j$ ($\overline{S}_j$)
               a lower (upper) limit on the annual amount of supplies
               procured from source $j$.

The interpretation of the two-stage problem should be evident. Constraints
(14) amount to an annual material balance requirement at each facility.

It is easy to show that the Customer Aggregation Theorem holds
without change. The only alteration needed in the proof given in the
Appendix involves the addition of $\bar{x}_{jk} = x^{\circ}_{jk}$, all $jk$,
to (A4).


6.  Designing  Aggregations  by  Clustering 


Up to this point we have taken the viewpoint that a priori bounds are
useful for evaluating the comparative merits of alternative customer
aggregations on a case by case basis. An obvious next step would be to
attempt to automate the aggregation design process by seeking the coarsest
possible customer aggregation with an a priori error bound no larger than
some prespecified limit.

This leads to a well-defined but exceedingly difficult combinatorial
optimization problem. Fortunately, truly optimal solutions are quite
unnecessary and so it is appropriate to seek good heuristic techniques.

The methods of cluster analysis [1] are attractive as a source of
heuristic techniques for customer aggregation. A hierarchical (that is,
parametric) clustering approach starting with as many clusters as original
customers and then combining clusters one at a time, perhaps with periodic
individual reclassification of customers, appears to hold promise. The
result would be an approximate tradeoff curve between the number of
customer clusters and the magnitude of the a priori bound given by (9).

The chosen measure of association or distance between two compatible
customer clusters, say $L_1$ and $L_2$, plays a vital role. Perhaps the
most natural measure is this one:

(16)  $d(L_1, L_2) = \epsilon_{L_1 \cup L_2} - (\epsilon_{L_1} + \epsilon_{L_2})$.

The additivity property of $\epsilon$ developed in Sec. 4 is essential in
justifying this measure in that it causes the influence of all other
clusters to cancel out.
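A minimal sketch of how a clustering heuristic of this kind might look is given below. It is not the procedure under development by the author, which the text does not describe; it is a plain greedy agglomeration, without the periodic reclassification step, that reads measure (16) as the increase in the total bound when two clusters are merged and assumes (9) has the form used in the calculation of Sec. 2. All function names and the four-customer data set are hypothetical.

```python
from itertools import combinations


def eps(d, q, L):
    """Bound (9) for one aggregation subset L (assumed form, see lead-in)."""
    total_q = sum(q[l] for l in L)
    return sum(
        max(q[l] / total_q * sum(d[k][m] for m in L) - d[k][l] for k in d)
        for l in L
    )


def merge_cost(d, q, L1, L2):
    """Measure (16): increase in the total bound caused by merging L1 and L2."""
    return eps(d, q, L1 | L2) - (eps(d, q, L1) + eps(d, q, L2))


def greedy_clusters(d, q, limit):
    """Merge the cheapest pair repeatedly while the total bound stays <= limit."""
    clusters = [frozenset([l]) for l in q]   # start with one cluster per customer
    total = 0.0                              # singleton clusters have zero bound
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: merge_cost(d, q, *pair))
        step = merge_cost(d, q, a, b)
        if total + step > limit:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        total += step
    return clusters, total


# Hypothetical data: customers c1, c2 look alike, and so do c3, c4.
d = {"A": {"c1": 10.0, "c2": 10.0, "c3": 30.0, "c4": 30.0},
     "B": {"c1": 20.0, "c2": 20.0, "c3": 5.0, "c4": 5.0}}
q = {"c1": 1.0, "c2": 1.0, "c3": 1.0, "c4": 1.0}

clusters, total = greedy_clusters(d, q, limit=1.0)
print(clusters, total)
```

Because the bounds of disjoint subsets add, the running total only needs to be updated by the merge cost of the pair just combined; recording the total after each merge would trace out the tradeoff curve between the number of clusters and the bound.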

An attempt is currently under way to develop and test clustering
techniques based on (16) as a means of at least partially automating the
customer aggregation aspect of distribution model design.

APPENDIX

PROOF OF THE CUSTOMER AGGREGATION THEOREM

The proof is based upon the following fundamental result.

Lemma 1 (Restrictive Approximation). Consider a general mathematical
programming problem

(P)  Minimize over $x$:  $f(x)$ subject to $x \in X$

and also the following restrictive approximation to it

(Q)  Minimize over $x$:  $f(x)$ subject to $x \in X \cap \bar{X}$,

where $X$ and $\bar{X}$ are both subsets of the same set. Assume that $X$
is not empty and that a feasibility recovery rule is known which associates
to every point $x^{\circ}$ in $X$ some point $\bar{x}(x^{\circ})$ in
$X \cap \bar{X}$ in such a manner that, for some scalar $\epsilon_1$,

(A1)  $f(\bar{x}(x^{\circ})) \le f(x^{\circ}) + \epsilon_1$  for all $x^{\circ} \in X$.

Then

(A2)  $v[P] \le v[Q] \le v[P] + \epsilon_1$

and every $\epsilon_2$-optimal solution $x^Q$ of (Q) is
$(\epsilon_1 + \epsilon_2)$-optimal in (P), where $v[\cdot]$ denotes the
infimal value of problem $[\cdot]$.

Proof. We have

$v[Q] \le \inf_{x^{\circ} \in X} f(\bar{x}(x^{\circ})) \le v[P] + \epsilon_1$,

where the first inequality follows from the fact that
$\bar{x}(x^{\circ}) \in X \cap \bar{X}$ and the second inequality follows
from (A1). This, with the evident fact that $v[P] \le v[Q]$, proves (A2).
The other desired conclusion requires demonstrating

(A3)  $f(x^Q) \le v[P] + \epsilon_1 + \epsilon_2$,

since $x^Q$ clearly is feasible in (P). This follows directly from (A2) and
the hypothesis

$f(x^Q) \le v[Q] + \epsilon_2$.

The most fruitful applications of this result are those in which a
feasibility recovery rule can be devised with a small associated
$\epsilon_1$. In the context of problem (1)-(5) and its aggregated version
(1)-(6), the following rule appears to be the most natural one.

Lemma 2. Let $(y^{\circ}, z^{\circ})$ be any feasible solution of problem
(1)-(5), and let $L$ be any subset of demand points satisfying (7). Make
the [...]

Constraint (2) holds for $\ell \notin L$ because the $y_{k\ell}$'s involved
were not changed. For $\ell \in L$, we have

$\sum_k \bar{y}_{k\ell} = \sum_k \frac{\sum_{\ell' \in L} q_{\ell'} y^{\circ}_{k\ell'}}{\sum_{\ell' \in L} q_{\ell'}} = \frac{\sum_{\ell' \in L} q_{\ell'} \left( \sum_k y^{\circ}_{k\ell'} \right)}{\sum_{\ell' \in L} q_{\ell'}} = 1$,

where the last equality follows from

$\sum_k y^{\circ}_{k\ell'} = 1$  for all $\ell'$.

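This verifies that (2) holds for $\ell \in L$ as well as for
$\ell \notin L$. To verify (3) we need the result

(A7)  $\sum_\ell q_\ell \bar{y}_{k\ell} = \sum_\ell q_\ell y^{\circ}_{k\ell}$  for all $k$,

which can be seen as follows:

$\sum_\ell q_\ell \bar{y}_{k\ell} = \sum_{\ell \notin L} q_\ell y^{\circ}_{k\ell} + \sum_{\ell \in L} q_\ell \frac{\sum_{\ell' \in L} q_{\ell'} y^{\circ}_{k\ell'}}{\sum_{\ell' \in L} q_{\ell'}} = \sum_{\ell \notin L} q_\ell y^{\circ}_{k\ell} + \sum_{\ell' \in L} q_{\ell'} y^{\circ}_{k\ell'}$.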


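We remark that the need for (A7) to hold furnished the primary motivation
for the choice (A5). It follows easily from (A7) that (3) holds at
$(\bar{y}, \bar{z})$. Constraint (4) holds for $\bar{y}$ because (4) holds
for $y^{\circ}$ and each $C_k$ is just a convex combination of certain
$y^{\circ}_{k\ell}$'s. Constraint (5) holds because
$\bar{z} = z^{\circ}$. The very nature of (A4) implies that (6) holds at
$\bar{y}$. This establishes that $(\bar{y}, \bar{z})$ is feasible in the
aggregated problem (1)-(6).

Now subtract (1) evaluated at $(y^{\circ}, z^{\circ})$ from (1) evaluated
at $(\bar{y}, \bar{z})$. The result is

$\sum_{k\ell,\, \ell \in L} d_{k\ell} (\bar{y}_{k\ell} - y^{\circ}_{k\ell}) + \sum_k F_k\left( \sum_\ell q_\ell \bar{y}_{k\ell} \right) - \sum_k F_k\left( \sum_\ell q_\ell y^{\circ}_{k\ell} \right)$

$= \sum_{k\ell,\, \ell \in L} d_{k\ell} (\bar{y}_{k\ell} - y^{\circ}_{k\ell})$

$= \sum_{k\ell,\, \ell \in L} d_{k\ell} \left( \frac{\sum_{\ell' \in L} q_{\ell'} y^{\circ}_{k\ell'}}{\sum_{\ell' \in L} q_{\ell'}} - y^{\circ}_{k\ell} \right)$

$= \sum_{k\ell,\, \ell \in L} y^{\circ}_{k\ell} \left( \frac{q_\ell}{\sum_{\ell' \in L} q_{\ell'}} \sum_{\ell' \in L} d_{k\ell'} - d_{k\ell} \right)$,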


where the first equality follows from (A7), the second from (A5), and the
third from a rearrangement of terms. This demonstrates (A6).

The Customer Aggregation Theorem follows easily from Lemmas 1 and 2
upon making the obvious identifications. The one additional fact needed
is that the difference expression (A6) is bounded above by $\epsilon_L$ as
defined in (9), since $y^{\circ}$ must satisfy (2) and (4).
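The feasibility recovery rule behind Lemma 2 can be illustrated with a small Python sketch. It assumes, as the verification of (2) and (A7) in the proof indicates, that the rule leaves the allocations of customers outside $L$ unchanged and replaces each allocation inside $L$ by the demand-weighted average over $L$ for the same facility. The function name `recover` and the three-customer data are hypothetical.

```python
def recover(y0, q, L):
    """Feasibility recovery rule (assumed form): inside L, replace each
    y0[k][l] by the demand-weighted average of y0[k][.] over L."""
    total_q = sum(q[l] for l in L)
    y_bar = {}
    for k, row in y0.items():
        avg = sum(q[l] * row[l] for l in L) / total_q
        y_bar[k] = {l: (avg if l in L else row[l]) for l in row}
    return y_bar


# Hypothetical feasible allocation: each customer's fractions sum to 1.
q = {"c1": 15.0, "c2": 10.0, "c3": 20.0}
y0 = {"A": {"c1": 1.0, "c2": 0.0, "c3": 0.25},
      "B": {"c1": 0.0, "c2": 1.0, "c3": 0.75}}
L = {"c1", "c2"}
y_bar = recover(y0, q, L)

for l in q:   # constraint (2): every customer is still fully served
    assert abs(sum(y_bar[k][l] for k in y_bar) - 1.0) < 1e-12
for k in y0:  # (A7): each facility's throughput is unchanged
    before = sum(q[l] * y0[k][l] for l in q)
    after = sum(q[l] * y_bar[k][l] for l in q)
    assert abs(before - after) < 1e-9
print(y_bar["A"]["c1"], y_bar["B"]["c1"])
```

Since throughputs are preserved, the facility-related costs $F_k$ and the throughput limits in (3) are unaffected, which is why only the $d_{k\ell}$ terms appear in the cost difference.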


